Abstract

Consider n players whose "scores" are independent and identically
distributed values $\{X_i\}_{i=1}^n$ from some discrete distribution F. We pay
special attention to the cases where (i) F is geometric with parameter
$p \to 0$ and (ii) F is uniform on $\{1, 2, \ldots, N\}$; the latter case clearly
corresponds to the classical occupancy problem. The quantities of interest
to us are, first, the U-statistic W, which counts the number of "ties"
between pairs; second, the univariate statistic $Y_r$, which counts the
number of strict r-way ties between contestants, i.e., episodes of the
form $X_{i_1} = X_{i_2} = \cdots = X_{i_r}$; $X_j \ne X_{i_1}$, $j \ne i_1, i_2, \ldots, i_r$; and, last but
not least, the multivariate vector $Z_{AB} = (Y_A, Y_{A+1}, \ldots, Y_B)$. We provide
Poisson approximations for the distributions of W, $Y_r$ and $Z_{AB}$
under some general conditions. New results on the joint distribution
of cell counts in the occupancy problem are derived as a corollary.

1 Introduction 

In this paper we hope to shed new light on an old problem, studied extensively
in, e.g., [6], [11]. Consider n players whose "scores" are independent and
identically distributed values $\{X_i\}_{i=1}^n$ from some discrete distribution F. We
consider the case of general distributions F but pay special attention to the
cases where (i) F is geometric with parameter $p \to 0$ and (ii) F is uniform on
$\{1, 2, \ldots, N\}$; the latter case corresponds to the classical occupancy problem.
The quantities of interest to us are

• the U-statistic W, which counts the number of "ties" between pairs i, j
(with $X_a = X_b = X_c = X_d$, for example, leading to a contribution of
$\binom{4}{2} = 6$ to the value of W);

• the univariate statistic $Y_r$, which counts the number of strict r-way ties
between contestants, i.e., episodes of the form $X_i = x$ for some x iff
$i \in A$, $|A| = r$; and

• the multivariate vector $Z_{AB} = (Y_A, Y_{A+1}, \ldots, Y_B)$.

We provide Poisson approximations for the distributions of W, $Y_r$ and $Z_{AB}$
under some general conditions. New results on the joint distribution of cell
counts in the occupancy problem are derived as a corollary.

Consider the following elementary problem from [13] : "Two players use 
a coin that lands heads with probability p to play a game that consists of 
a sequence of rounds. In each round, the first player tosses the coin until a 
head appears. Then the second player tosses the coin until a head appears. 
If the players have the same number of flips in a round, the round is declared 
a tie and another round is played. If not, the player with the larger number 
of flips wins the game. Rounds are played successively until one of the two 
players wins the game." Readers are asked to find the expected number of 
rounds; the expected value of the total number of flips; and the probability 
distribution of the difference between the number of flips made by players 1 
and 2 in a given round. We briefly mention the solution for the first two of
these questions: The probability of a two person tie is clearly

$$\sum_{x=1}^{\infty} (1-p)^{2x-2} p^2 = \frac{p}{2-p},$$

so that $\mathbb{E}(R)$, the expected number of rounds, is given by

$$\mathbb{E}(R) = \sum_{x=1}^{\infty} x\,(\mathbb{P}(\text{tie}))^{x-1}(1 - \mathbb{P}(\text{tie})) = \sum_{x=1}^{\infty} x \left(\frac{p}{2-p}\right)^{x-1} \left(1 - \frac{p}{2-p}\right) = \frac{2-p}{2-2p},$$

so that Wald's lemma yields for $\mathbb{E}(F)$, the expected total number of flips,

$$\mathbb{E}(F) = \mathbb{E}(F/R)\,\mathbb{E}(R) = \frac{2-p}{2-2p}\,\mathbb{E}(F/R) = \frac{2(2-p)}{p(2-2p)},$$

since the expected number $\mathbb{E}(F/R)$ of flips per round is clearly 2/p. Computations
for a three-person game, not mentioned in [13], are similar, but we
need to lay down some rules as follows: Three players each flip a p-coin until
heads is flipped. The player with the highest number of flips wins unless there
are ties between two or more players, in which case we repeat the process.
That is, the value of each of the three geometric variables in question must
be unique. We next compute the probability of a two- or three-way tie; the
expected number of rounds; and the expected number of flips for n = 3, to
convince the reader that the situation rapidly becomes quite complicated as
n increases. [The authors had a lively discussion with Lloyd Douglas, NSF
Program Officer, about the following "real-life" application of the n-person
model with $p \to 0$. We wish to rank n of the greatest free-throw shooters
(or slam dunkers, or,...) in the National Basketball Association. The players
each shoot free throws until they miss, conditional on the fact that no two
players miss on the same attempt. Rankings are then awarded in the obvious
fashion.]

With three players (A, B, C), there are 3! = 6 ways to have a strict inequality
and seven ways to tie, since there is one way for a three way tie (which we
loosely write as "A = B = C") to occur; $\binom{3}{1} = 3$ ways for A > B = C to
occur; and $\binom{3}{2} = 3$ ways for A = B > C to occur. Note that

$$\mathbb{P}(A = B = C) = p^3 + p^3(1-p)^3 + p^3(1-p)^6 + \cdots = \sum_{x=1}^{\infty} p^3 (1-p)^{3x-3} = \frac{p^2}{3 - 3p + p^2},$$

while the table below

Case   A                     B       C
1      TH, TTH, ...          H       H
2      TTH, TTTH, ...        TH      TH
3      TTTH, TTTTH, ...      TTH     TTH
4      TTTTH, TTTTTH, ...    TTTH    TTTH

reveals that

$$\mathbb{P}(A > B = C) = \sum_{m=1}^{\infty} p^2 (1-p)^{2m-2} \left(1 - \sum_{i=0}^{m-1} p(1-p)^i\right) = \frac{p(1-p)}{3 - 3p + p^2}.$$

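Finally, we observe from the table

Case   A        B        C
1      TH       TH       H
2      TTH      TTH      H, TH
3      TTTH     TTTH     H, TH, TTH
4      TTTTH    TTTTH    H, TH, TTH, TTTH

that

$$\mathbb{P}(A = B > C) = \sum_{m=1}^{\infty} p^2 (1-p)^{2m} \sum_{i=0}^{m-1} p(1-p)^i = \frac{p(1-p)^2}{(2-p)(3 - 3p + p^2)},$$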

which leads to

$$\mathbb{P}(\text{tie}) = \mathbb{P}(A = B = C) + 3\,\mathbb{P}(A > B = C) + 3\,\mathbb{P}(A = B > C) = \frac{5p^3 - 13p^2 + 9p}{(2-p)(3 - 3p + p^2)},$$

and hence as before to

$$\mathbb{E}(R) = \left(1 - \frac{5p^3 - 13p^2 + 9p}{(2-p)(3 - 3p + p^2)}\right)^{-1}$$

and

$$\mathbb{E}(F) = \mathbb{E}(F/R)\,\mathbb{E}(R) = \frac{3}{p} \left(1 - \frac{5p^3 - 13p^2 + 9p}{(2-p)(3 - 3p + p^2)}\right)^{-1}.$$
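The closed forms for the two- and three-person games can be checked with exact rational arithmetic. The following is a minimal sketch and not part of the original argument; the function names and the truncated enumeration `brute_force_tie_probability` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def tie_probability_two(p):
    """P(tie) for two players: p / (2 - p)."""
    return p / (2 - p)


def tie_probability_three(p):
    """P(at least two of the three scores coincide), as derived in the text."""
    return (5 * p**3 - 13 * p**2 + 9 * p) / ((2 - p) * (3 - 3 * p + p**2))


def expected_rounds(tie_prob):
    """Rounds repeat independently until there is no tie, so E(R) = 1 / (1 - P(tie))."""
    return 1 / (1 - tie_prob)


def expected_flips(players, p, tie_prob):
    """Wald's lemma: (expected flips per round, players / p) times E(R)."""
    return Fraction(players) / p * expected_rounds(tie_prob)


def brute_force_tie_probability(p, players, cutoff):
    """Hypothetical check: sum P(scores) over tied score vectors with entries <= cutoff."""
    total = Fraction(0)
    for scores in product(range(1, cutoff + 1), repeat=players):
        if len(set(scores)) < players:
            prob = Fraction(1)
            for x in scores:
                prob *= p * (1 - p) ** (x - 1)
            total += prob
    return total


half = Fraction(1, 2)
print(tie_probability_two(half), expected_flips(2, half, tie_probability_two(half)))
print(tie_probability_three(half), expected_flips(3, half, tie_probability_three(half)))
```

The enumeration is truncated at `cutoff`, so it slightly underestimates the tie probability; the discrepancy is at most the probability that some score exceeds the cutoff.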
Competitions of the kind discussed above are best formulated in the more
general context of occupancy models as follows: n balls are independently
thrown into an infinite array of boxes so that any ball hits the jth box
with probability $p_j$. Let $X_j$ be the number of balls in box j. Then, with
$p_j = (1-p)^{j-1} p$, we have the game inspired by [13] ending iff $X_j \le 1\ \forall j$.
Extremal versions of such questions have arisen in the literature before, often
with surprising results. Motivated by a question, posed by Carl Pomerance
and arising in an additive number theory context, Athreya and Fidkowski
[2] proved that the probability $\pi_n$ that the highest numbered non-empty box
has exactly one ball in it converges to a constant (which is shown to be
one) iff $\lim_{n \to \infty} p_n / \sum_{j=n}^{\infty} p_j = 0$. This is a condition that is not satisfied
by, e.g., the sequence $p_n = 1/2^n$ for which, quite interestingly, the limit
superior and the limit inferior of the sequence $\pi_n$ differ in the fourth decimal
place. These results had been independently obtained a few years earlier by
Eisenberg et al. [7], [9], [10] and also by Bruss and O'Cinneide [Hj. The
comprehensive paper of Mori [12] is most relevant too: Here it is proven
that given a double sequence of integer valued random variables, i.i.d. within
rows, and letting $\mu(n)$ denote the multiplicity of the maximal value in the
nth row, the limiting distribution of $\mu(n)$ does not exist in the ordinary sense,
but that the intriguing empirical type a.s. limit result



holds, where r is a parameter that depends on the distribution. The whole 
field appears to be extraordinarily rich with known facts and tantalizing 
possibilities. 

Results of the kind described above indicate that the cell counts for the
n person (Geometric) coin game are unlikely to behave in an asymptotically
smooth way if $p = p_n \not\to 0$. This fact is borne out in Theorem 1, where we
study the distribution of the number W of pairs of equalities in the n person
game, with W = 0 corresponding to the end of a "round" in the sense of [13],
and show that a good Poisson approximation is obtained if $np \to 0$ (Geometric
distribution) or $n/N \to 0$ (Uniform distribution). Theorem 2 concerns
itself with the distribution of the number $Y_r$ of strict r-way ties (= the number
of boxes with exactly r balls) and Theorem 3 with a multivariate generalization
of Theorem 2. The approximating distribution is Poisson (Theorem 2)
or a product of independent Poisson variates (Theorem 3). We note, moreover,
that we were able to prove a result such as Theorem 3 relatively easily,
probably due to the approach taken: we use as a counter the event that r
specific balls go into the same urn, rather than the conventional approach
(e.g., [6], Section 6.2) of counting the number of urns with r balls. See also 

m, mi, m,0, and @. 



2 Results 

Theorem 1 Let $\{X_i\}_{i=1}^n$ be an integer valued sequence of i.i.d. random
variables with $\mathbb{P}(X_1 = i) = p_i$, and consider the U-statistic

$$W = \sum_{K=1}^{\binom{n}{2}} I_K,$$

where, with $\mathcal{K}$ denoting the Kth 2-subset of $\{1, 2, \ldots, n\}$, $I_K = 1$ if $X_i = X_j$;
$i, j \in \mathcal{K}$ ($I_K = 0$ otherwise). For any two discrete random variables T
and U, let $d_{TV}(\mathcal{L}(T), \mathcal{L}(U))$ denote the usual total variation distance between
their distributions $\mathcal{L}(T)$ and $\mathcal{L}(U)$, i.e.,

$$d_{TV}(\mathcal{L}(T), \mathcal{L}(U)) = \sup_{A \subseteq \mathbb{Z}^+} |\mathbb{P}(T \in A) - \mathbb{P}(U \in A)|,$$

and let $\mathrm{Po}(\lambda)$ be the Poisson r.v. with parameter $\lambda = \mathbb{E}(W)$. Then

$$d_{TV}(\mathcal{L}(W), \mathrm{Po}(\lambda)) \le 2n\pi + \frac{2n\rho}{\pi},$$

where $\pi = \mathbb{P}(X_1 = X_2)$ and $\rho = \mathbb{P}(X_1 = X_2 = X_3)$.

Proof The proof is an elementary application of, e.g., Theorem 2.C.5 in [6],
which yields with $\lambda = \mathbb{E}(W) = \binom{n}{2}\pi$

$$d_{TV}(\mathcal{L}(W), \mathrm{Po}(\lambda)) \le \pi + \frac{2(n-2)\rho}{\pi} + 2(n-2)\pi \le 2n\pi + \frac{2n\rho}{\pi},$$

as asserted.

In Theorem 1, if the variables are uniform over $\{1, 2, \ldots, N\}$, then $\pi = 1/N$;
$\rho = 1/N^2$, so that $d_{TV}(\mathcal{L}(W), \mathrm{Po}(\lambda)) \le 4n/N \to 0$ if $N \gg n$, where,
throughout this paper, we write for $f_n, g_n \ge 0$, $f_n \ll g_n$ (or $g_n \gg f_n$) if
$f_n/g_n \to 0$ as $n \to \infty$. If the variables are Geometric(p), then the discussion
in Section 1 yields $\pi = p/(2-p)$ and $\rho = p^2/(3 - 3p + p^2)$, so that we get
$d_{TV}(\mathcal{L}(W), \mathrm{Po}(\lambda)) \le 2np/(2-p) + 2np(2-p)/(3 - 3p + p^2) \le 6np \to 0$ if
$p \ll 1/n$. For the n person game discussed in Section 1, we thus get

$$\mathbb{P}(W = 0) = \mathbb{P}(\text{no ties}) = \exp\{-(n(n-1)p)/(2(2-p))\} \pm 6np = \exp\{-\lambda\} \pm 6np,$$

$$\mathbb{E}(R) = \frac{1}{\mathbb{P}(W = 0)} = \frac{1}{e^{-\lambda} \pm 6np},$$

and

$$\mathbb{E}(F) = \frac{n}{p}\,\mathbb{E}(R).$$
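In the uniform case the probability of no ties is available in closed form (it is the classical birthday computation), so the Poisson approximation of Theorem 1 can be compared with the exact value. The sketch below is illustrative only; the function names are our own, and the error bound used is the $4n/N$ stated above.

```python
from math import comb, exp


def no_tie_probability_uniform(n, N):
    """Exact P(W = 0): n scores drawn uniformly from N values are all distinct."""
    prob = 1.0
    for k in range(n):
        prob *= (N - k) / N
    return prob


def poisson_no_tie_approximation(n, N):
    """exp(-lambda) with lambda = E(W) = C(n, 2) * pi and pi = 1/N."""
    return exp(-comb(n, 2) / N)


def theorem1_bound_uniform(n, N):
    """Total variation bound 2*n*pi + 2*n*rho/pi with pi = 1/N, rho = 1/N^2."""
    return 4 * n / N


for n, N in [(23, 365), (20, 10_000)]:
    exact = no_tie_probability_uniform(n, N)
    approx = poisson_no_tie_approximation(n, N)
    print(n, N, exact, approx, theorem1_bound_uniform(n, N))
```

Since total variation distance dominates the difference of the two probabilities of the event {W = 0}, the gap between the exact and approximate values should never exceed the printed bound.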

The random variable W, while providing us with some insight, does not
yield the level of detail that we desire. For this reason, we turn our attention
next to the variable $Y_r$ that counts the "number of strict r-way ties," or, in
other words, the "number of boxes with exactly r balls." The development
that follows is alternative to that provided, say, in [6], Theorems 6.C, 6.E,
and particularly 6.F, though we do not make too many comparisons between
our results and those of [6], since our main focus will be on the multivariate
Theorem 3; the strategy of looking at specific sets of r players is what sets
our method apart.

Letting as before $\{X_i\}_{i=1}^n$ be an integer valued sequence of i.i.d. random
variables with $\mathbb{P}(X_1 = i) = p_i$, we denote by

$$\pi = \sum_x \pi_x, \qquad \pi_x = p_x^r (1 - p_x)^{n-r},$$

the probability that a specific set of r players are involved in a strict tie, and
thus

$$\lambda = \binom{n}{r} \pi$$

is the expected number of boxes with exactly r balls. Throughout this paper
we will employ, as in the previous sentence, the dual analogies of "balls
in boxes" and "ties between contestants." It may be readily verified that
$\pi = (1/N^{r-1})(1 - N^{-1})^{n-r}$ in the uniform case, and that in the geometric
case $\pi = \sum_{x=1}^{\infty} (1-p)^{rx-r} p^r (1 - (1-p)^{x-1} p)^{n-r}$ may be estimated as follows;
the estimates may be seen to be tight provided that $p \to 0$. First we have



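$$\begin{aligned}
\pi &= \sum_{x=1}^{\infty} (1-p)^{rx-r} p^r \left(1 - (1-p)^{x-1} p\right)^{n-r} \\
&\ge \sum_{x=1}^{\infty} (1-p)^{rx-r} p^r \left[1 - (n-r)(1-p)^{x-1} p\right] \\
&= \sum_{x=1}^{\infty} (1-p)^{rx-r} p^r - (n-r) \sum_{x=1}^{\infty} (1-p)^{rx-r} p^r (1-p)^{x-1} p \\
&\ge \frac{p^{r-1}}{r} - \frac{2(n-r) p^r}{(r+1)(2 - rp)},
\end{aligned}$$

where the above inequalities follow since $(1-p)^r \ge 1 - rp$ and $(1-p)^{r+1} \le
1 - (r+1)p + [(r+1)r p^2]/2$. Next note that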



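$$\begin{aligned}
\pi &= \sum_{x=1}^{\infty} (1-p)^{rx-r} p^r \left(1 - (1-p)^{x-1} p\right)^{n-r} \\
&\le \sum_{x=1}^{\infty} (1-p)^{rx-r} p^r \left[1 - (n-r)(1-p)^{x-1} p + \frac{(n-r)(n-r-1)}{2} (1-p)^{2x-2} p^2\right] \\
&= \frac{p^r}{1 - (1-p)^r} - \frac{(n-r) p^{r+1}}{1 - (1-p)^{r+1}} + \frac{(n-r)(n-r-1)}{2} \cdot \frac{p^{r+2}}{1 - (1-p)^{r+2}} \\
&\le \frac{2 p^{r-1}}{r(2 - (r-1)p)} - \frac{(n-r) p^r}{r+1} + \frac{(n-r)^2 p^{r+1}}{(r+2)(2 - (r+1)p)},
\end{aligned}$$

so that in the geometric case, $\pi \sim p^{r-1}/r$ provided that $np^{(r+1)/2} \to 0$; $rp \to 0$.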
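The two-sided estimate for $\pi$ in the geometric case can be sanity-checked numerically by truncating the defining series. This is a minimal sketch under our own naming; the truncation length `terms` is an assumption, chosen large enough that the neglected tail is negligible for the parameters tried.

```python
def pi_geometric(n, r, p, terms=5000):
    """Truncated series for pi: sum over x of q_x^r (1 - q_x)^(n - r), q_x = (1-p)^(x-1) p."""
    total = 0.0
    for x in range(1, terms + 1):
        q = (1 - p) ** (x - 1) * p  # probability that a single score equals x
        total += q**r * (1 - q) ** (n - r)
    return total


def pi_lower_bound(n, r, p):
    """Lower estimate p^(r-1)/r - 2(n-r)p^r / ((r+1)(2 - rp))."""
    return p ** (r - 1) / r - 2 * (n - r) * p**r / ((r + 1) * (2 - r * p))


def pi_upper_bound(n, r, p):
    """Upper estimate with the three terms displayed above."""
    return (
        2 * p ** (r - 1) / (r * (2 - (r - 1) * p))
        - (n - r) * p**r / (r + 1)
        + (n - r) ** 2 * p ** (r + 1) / ((r + 2) * (2 - (r + 1) * p))
    )


for n, r, p in [(50, 3, 0.01), (20, 2, 0.005)]:
    print(pi_lower_bound(n, r, p), pi_geometric(n, r, p), pi_upper_bound(n, r, p))
```

For small p the three printed numbers are close to one another and to $p^{r-1}/r$, which is the sense in which the estimates are tight.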

We shall use the coupling approach as in [6] to show that $\mathcal{L}(Y_r)$ may be
closely approximated by a Poisson distribution with the same mean. We need
to first find, given a sum $\sum_{j=1}^{n} I_j$ of indicator variables, a sequence $\{J_{ij}\}$ of
indicator variables, defined on the same probability space as the $I_j$s, so that
for each j,

$$\mathcal{L}(J_{1j}, J_{2j}, \ldots, J_{nj}) = \mathcal{L}(I_1, I_2, \ldots, I_n \mid I_j = 1). \qquad (1)$$

Good error bounds on a Poisson approximation are obtained if the $J_{ij}$s are
chosen in a fashion that makes them "not too far apart" from the $I_j$s. We
proceed in a manner similar to that in Theorem 6.F in [6], but the coupling
we use is conditional, thus imparting a different flavor to the argument: We
have

$$Y_r = \sum_{j=1}^{\binom{n}{r}} I_j,$$

where $I_j = 1$ if the jth r-set is engaged in a strict tie. Now we define the
indicator variable $I_{jx}$ as being one if and only if $I_j = 1$ and the members of
the jth r-set all have "value" x. Now we proceed as follows: If $I_{jx} = 1$,
we "do nothing", setting $J_i = I_i$ for all i. If, however, $I_{jx} = 0$, we move all
members of the jth r-set into the xth box (some of these might of course have
occupied the xth box to begin with), while ejecting all its "illegal" occupants
and moving each of these independently with probability $p_k/(1 - p_x)$ to box
k; $k \ne x$. Finally we set $J_i = J_{ijx} = 1$ if the ith r-set is involved in a strict
tie after this interchange. We need to verify that (1) holds in the modified
form

$$\mathcal{L}(J_{1jx}, J_{2jx}, \ldots, J_{\binom{n}{r}jx}) = \mathcal{L}(I_1, I_2, \ldots, I_{\binom{n}{r}} \mid I_{jx} = 1); \qquad (2)$$

while this may be viewed as being "obvious," we provide a proof next. To
show that (2) holds, it clearly suffices to show that any configuration (or
sample point) corresponding to the members of the jth r-set being "the only
occupants of the xth box" is equally likely under both the conditional and
unconditional models in (2). This strategy will achieve more, in fact, since
we will not have to verify a condition similar to (2) when we move on to the
multivariate case.

We let $a_i$ denote the score of the ith player not in the r-clique in question
($a_i \ne x$), and $b_i$ the score of the ith player in the r-clique, so that $b_i = x$.
Now

$$\mathbb{P}(\text{configuration} \mid I_{jx} = 1) = \frac{\mathbb{P}(\text{configuration})}{\mathbb{P}(I_{jx} = 1)} = \frac{\mathbb{P}(a_1, \ldots, a_{n-r}; b_{n-r+1}, \ldots, b_n)}{p_x^r (1 - p_x)^{n-r}} = \frac{\mathbb{P}(a_1)\mathbb{P}(a_2) \cdots \mathbb{P}(a_{n-r})}{(1 - p_x)^{n-r}}.$$

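Note also that the probability of the configuration under the coupled model
is given by

$$\sum_{l=0}^{n-r}\ \sum_{S \subseteq \{1, \ldots, n-r\},\, |S| = l}\ \prod_{j \in \{1, 2, \ldots, n-r\} \setminus S} \mathbb{P}(a_j) \prod_{j \in S} p_x\, \mathbb{P}^*(a_j), \qquad (3)$$

where S is the set of players outside the r-clique that are ejected from box x,
and $\mathbb{P}^*(a_j)$ is the probability that an ejected player lands in box $a_j$. Now

$$\mathbb{P}^*(a_j) = \frac{\mathbb{P}(a_j)}{1 - p_x},$$

so that (3) equals

$$\sum_{l=0}^{n-r} \binom{n-r}{l} \left(\frac{p_x}{1 - p_x}\right)^{l} \prod_{j=1}^{n-r} \mathbb{P}(a_j) = \left(1 + \frac{p_x}{1 - p_x}\right)^{n-r} \prod_{j=1}^{n-r} \mathbb{P}(a_j) = \frac{\mathbb{P}(a_1)\mathbb{P}(a_2) \cdots \mathbb{P}(a_{n-r})}{(1 - p_x)^{n-r}},$$

which shows that (3) yields the same expression as before. This proves the
claim.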
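The claim can also be confirmed by brute force on a small example: enumerate every initial assignment of players to boxes, apply the relocation step, and compare the resulting law with the conditional law given that the chosen r-set is alone in box x. The sketch below is our own illustration with hypothetical function names; boxes and players are indexed from 0, and the "do nothing" case is covered automatically because no one is ejected and the r-set is already in box x.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def coupled_law(probs, n, clique, x):
    """Exact law of the configuration after moving the clique into box x
    and relocating its other occupants to box k != x w.p. probs[k] / (1 - probs[x])."""
    boxes = range(len(probs))
    others = [i for i in range(n) if i not in clique]
    targets = [k for k in boxes if k != x]
    law = defaultdict(Fraction)
    for config in product(boxes, repeat=n):
        weight = Fraction(1)
        for box in config:
            weight *= probs[box]
        ejected = [i for i in others if config[i] == x]
        for destinations in product(targets, repeat=len(ejected)):
            moved = list(config)
            for i in clique:
                moved[i] = x
            w = weight
            for i, k in zip(ejected, destinations):
                moved[i] = k
                w *= probs[k] / (1 - probs[x])
            law[tuple(moved)] += w
    return dict(law)


def conditional_law(probs, n, clique, x):
    """Law of the original configuration given that exactly the clique occupies box x."""
    law = {}
    total = Fraction(0)
    for config in product(range(len(probs)), repeat=n):
        if {i for i in range(n) if config[i] == x} == set(clique):
            weight = Fraction(1)
            for box in config:
                weight *= probs[box]
            law[config] = weight
            total += weight
    return {config: weight / total for config, weight in law.items()}


probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
print(coupled_law(probs, 4, (0, 1), 0) == conditional_law(probs, 4, (0, 1), 0))
```

Exact fractions are used so that the comparison is an identity of distributions rather than a floating-point approximation.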

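Now Theorem 2.B in [6] leads to the following inequality:

$$\begin{aligned}
d_{TV}(\mathcal{L}(Y_r), \mathrm{Po}(\lambda)) &\le \frac{1 - e^{-\lambda}}{\lambda} \sum_j \sum_x \mathbb{P}(I_{jx} = 1) \Big\{ \mathbb{P}(I_{jx} = 1) + \sum_{i \ne j} \mathbb{P}(J_{ijx} \ne I_i) \Big\} \\
&\le \pi + \Big(1 \wedge \frac{1}{\lambda}\Big) \sum_j \sum_x \mathbb{P}(I_{jx} = 1) \sum_{i \ne j} \mathbb{P}(J_{ijx} \ne I_i), \qquad (4)
\end{aligned}$$

where $\lambda = \mathbb{E}(Y_r)$, $\pi = \mathbb{P}(I_j = 1)$, and the coupled sequence $\{J_i\} = \{J_{ijx}\}$
satisfies (2) for each j and x. Consider first the case $\mathbb{P}(I_i = 0, J_{ijx} = 1)$,
which is clearly impossible when $|i \cap j| \ge 1$, and which we shall call Case I.
We thus have for $|i \cap j| = 0$,

$$\sum_{i \ne j} \mathbb{P}(I_i = 0, J_{ijx} = 1) = \binom{n-r}{r} \sum_{y \ne x} \mathbb{P}(I_i = 0, J_{ijx} = 1, y), \qquad (5)$$

where the summand $\mathbb{P}(I_i = 0, J_{ijx} = 1, y)$ represents the probability that the
ith r-set is not engaged in a strict r-way tie before the coupling, but is part
of such a tie with common value y after the coupling. (5) thus yields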

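$$\begin{aligned}
\sum_{i \ne j} \mathbb{P}(I_i = 0, J_{ijx} = 1) &= \sum_{i \ne j} \sum_{y \ne x} \{\mathbb{P}(J_{ijx} = 1, y) - \mathbb{P}(I_i = 1, J_{ijx} = 1, y)\} \\
&= \binom{n-r}{r} \sum_{y \ne x} \{\mathbb{P}(J_{ijx} = 1, y) - \mathbb{P}(I_i = 1, J_{ijx} = 1, y)\}, \qquad (6)
\end{aligned}$$

where

$$\mathbb{P}(J_{ijx} = 1, y) = \left(\frac{p_y}{1 - p_x}\right)^{r} \left(1 - \frac{p_y}{1 - p_x}\right)^{n-2r}$$

and

$$\mathbb{P}(I_i = 1, J_{ijx} = 1, y) = p_y^r (1 - p_y)^r \sum_{s=0}^{n-2r} \binom{n-2r}{s} p_x^s (1 - p_x - p_y)^{n-2r-s} \left(1 - \frac{p_y}{1 - p_x}\right)^{s} = p_y^r (1 - p_y)^r \left(1 - \frac{p_y}{1 - p_x}\right)^{n-2r}.$$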

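We now check to see the nature of the bound (6) in the uniform and geometric
cases: When the balls are distributed uniformly in N boxes, we see that (6)
leads to

$$\sum_{i \ne j} \mathbb{P}(I_i = 0, J_{ijx} = 1) = \binom{n-r}{r} \sum_{y \ne x} \frac{1}{N^r} \left(\frac{N-2}{N-1}\right)^{n-2r} \left[\left(\frac{N}{N-1}\right)^{r} - \left(\frac{N-1}{N}\right)^{r}\right] \le \frac{2r}{N} \exp\{2r/(N-1)\}\, \lambda, \qquad (7)$$

while in the geometric case we have

$$\begin{aligned}
\sum_{i \ne j} \mathbb{P}(I_i = 0, J_{ijx} = 1) &\le \binom{n-r}{r} \sum_{y=1}^{\infty} p_y^r (1 - p_y)^{n-r} (1 - p_y)^{-r} \left((1 - p_x)^{-r} - (1 - p_y)^{r}\right) \\
&\le \lambda \max_{x,y} \{(1 - p_x)^{-r} (1 - p_y)^{-r} - 1\} \\
&\le \lambda \exp\{2rp/(1 - p)\}\, 2rp. \qquad (8)
\end{aligned}$$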

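We now consider the case where $I_i = 1$ and $J_{ijx} = 0$ (Case II). We clearly
have $\mathbb{P}(I_i = 1, J_{ijx} = 0) = \mathbb{P}(I_i = 1)$ for $|i \cap j| \ge 1$ (Case II'), so we obtain

$$\sum_{|i \cap j| \ge 1} \mathbb{P}(I_i = 1, J_{ijx} = 0) = \sum_{|i \cap j| \ge 1} \pi \le \frac{r^2}{n}\, \lambda. \qquad (9)$$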
Next assume (Case II") that $|i \cap j| = 0$, and we seek to estimate the probability
$\mathbb{P}(I_i = 1, J_{ijx} = 0)$. If y = x, we bound $\mathbb{P}(I_{iy} = 1, J_{ijx} = 0)$ by $\pi_x$, so
that

$$\sum_{|i \cap j| = 0} \mathbb{P}(I_{ix} = 1, J_{ijx} = 0) \le \binom{n-r}{r} \pi_x. \qquad (10)$$

If, on the other hand, $y \ne x$, then we have



^ p(/, = i, j ijx = 0) = ( n r r ) P ( 7 * = J ^ = °) 



I in j |=0 x 7 J/^a; 



EE 7 * 



r / V o / y 1 — p x 

the above equation follows since in order for $I_{iy} = 1$, $J_{ijx} = 0$ to occur, we
must have at least one of the q "bad" balls present in urn x land in urn y
and thus "spoil" the fact that $I_{iy} = 1$. We thus get

|inj|=0 V 7 J/^a: r:r q>l V ^ 7 

V r ; l-p x ^ 

< (VhV*^ (id 

Now (11) reduces in the uniform case to 

n-r\ n ^ n/N N - 1 < n e" /Ar 



r ) N — 1 N r+1 



N 2 

< _^ Ae ^exp{(n-r)/(iV-l)} 

< ^Ae 2 ™/^- 1 ) (12) 



and is bounded in the geometric case by 



(" r r )^Bl-^ lKr+1) ^ AVe-(l + o(l)), (13) 
provided that $np^{(r+1)/2} \to 0$, $rp \to 0$. Equations (4), (6), (9), (10) and (11)
now yield

^rv(£(y r ),Po(A)) 
1 - e" A 



< 7T 



(Sr~) E E = !) E p ( J * ^ 



^ 7r+ ( 1A l)EE p ( / ^ = 1 

' j X 



+ 



n — r 
r 

n — r 



n-2r 



1 - Pa 



;i-p,r +-a 

1 n 



n — r \ p 



n- 



r 7 1 - p x 



+1 



(14) 



We next evaluate (14) in the uniform case: Equations (4), (7), (9), (10) and 
(12) give 



d TY (C(Y r ), Po(X)) 

- 7r+ ( 1A x)EE p fe = x ) E p ( J > ^ J > 



3 X 



< 71+ ( 1 A - ) A 



2r t 
Aexp{2r/(iV-l)} + -A 

JM n 



it + (A A A 2 ) ■ 



- exp{2r/(iV ~ 1)} + - + ^ + ^ exp{2n/(iV - 1)} j . (15) 
We compare (15) with Equation 6.2.18 in [6], which yields the upper bound

d TV (£(r r ),Po(A))<(AAA 2 ){i + ^ + ^}; 
it is evident that (15) provides a better estimate if 

- exp{2r/(iV - 1)} < — + (6 - exp{2n/(iV - !)})_, 
which is a condition that holds under a wide range of circumstances, and
certainly if $n/N \to 0$. Now in the geometric case, Equations (4), (8), (9),
(10), and (13) reveal that (14) reduces as follows:



We have thus proved 

Theorem 2 Let $\{X_i\}_{i=1}^n$ be an integer valued sequence of i.i.d. random variables
with $\mathbb{P}(X_1 = i) = p_i$. Define $Y_r$ to be the number of strict r-way ties
between these random variables. Then the total variation distance between
$\mathcal{L}(Y_r)$ and a Poisson distribution with the same mean is bounded as in (14). This
expression reduces to the one in Equation (15) when the distribution of the
$X_i$s is uniform on $\{1, 2, \ldots, N\}$ and to the expression in Equation (16) when
$X_i \sim \mathrm{Geo}(p)$.
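The mean that appears in Theorem 2 can be verified directly in the uniform case, where $\lambda = \binom{n}{r} N^{-(r-1)} (1 - 1/N)^{n-r}$. The following sketch (our own illustration, with hypothetical function names) enumerates all $N^n$ equally likely score vectors for small n and N and compares the exact mean of $Y_r$ with this formula.

```python
from fractions import Fraction
from itertools import product
from math import comb


def expected_strict_ties_exact(n, N, r):
    """E(Y_r) by enumerating all N**n equally likely score vectors."""
    total = 0
    for scores in product(range(N), repeat=n):
        counts = [0] * N
        for s in scores:
            counts[s] += 1
        # Y_r = number of values (boxes) hit by exactly r players (balls)
        total += sum(1 for c in counts if c == r)
    return Fraction(total, N**n)


def lambda_uniform(n, N, r):
    """lambda = C(n, r) * N^-(r-1) * (1 - 1/N)^(n-r) in the uniform case."""
    return comb(n, r) * Fraction(1, N ** (r - 1)) * (1 - Fraction(1, N)) ** (n - r)


print(expected_strict_ties_exact(5, 4, 2), lambda_uniform(5, 4, 2))
```

The enumeration grows like $N^n$, so it is only practical for very small parameters; it checks the mean, not the total variation bound itself.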

For the rest of the paper we will, for simplicity, restrict our attention to
the classical occupancy problem of n balls in N boxes, assuming furthermore,
that $n/N \to 0$. The goal is to obtain a multivariate Poisson approximation
for the vector $Z_{AB} = (Y_A, Y_{A+1}, \ldots, Y_B)$, for suitably restricted A and B, and
where the approximating Poisson vector consists of independent components.
First consider the quantities $\{\lambda_a : A \le a \le B\}$. Since



it follows, due to the fact that $n/N \to 0$, that $\lambda_a$ is monotone decreasing
in a. Suppose that $\lambda_A < \infty$ for some finite A. It then follows that the
approximating Poisson distribution for $Y_{A+1}$ would have mean close to zero,
making our agenda somewhat uninteresting. We shall assume therefore that
$\lambda_A \to \infty$ as $n, N \to \infty$. Choices of the parameters that make this occur
might be, e.g., $n = N^{\alpha}$; $\alpha < 1$, when $\mathbb{E}(Y_a) \to 0$ for all $a > A_0$, or, more
interestingly, $n = N/\log N$ in which case the threshold $A_0$ would tend to
infinity with N. We thus seek values of A and B for which we get an
"interesting" multivariate Poisson approximation for the ensemble $(Y_A, \ldots, Y_B)$.
Now Theorem 10.J in [6] yields, using notation suggested by that used in the
proof of Theorem 2,



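$$d_{TV}\Big(\mathcal{L}(Y_A, \ldots, Y_B), \prod_{a=A}^{B} \mathrm{Po}(\lambda_a)\Big) \le \sum_{a=A}^{B} \sum_{j=1}^{\binom{n}{a}} \sum_{x=1}^{N} \mathbb{P}(I_{ajx} = 1) \Big\{ \mathbb{P}(I_{ajx} = 1) + \sum_{biy \ne ajx} \mathbb{P}(I_{biy} \ne J_{biy}) \Big\}, \qquad (17)$$

where the last sum does not include the case bi = aj. Correspondingly, we
let $T_1, T_2, T_3$ denote the quantities

$$\sum_{a=A}^{B} \sum_{j=1}^{\binom{n}{a}} \sum_{x=1}^{N} \mathbb{P}(I_{ajx} = 1)\, \mathbb{P}(I_{ajx} = 1),$$

$$\sum_{a=A}^{B} \sum_{j=1}^{\binom{n}{a}} \sum_{x=1}^{N} \mathbb{P}(I_{ajx} = 1) \sum_{iy \ne jx} \mathbb{P}(I_{aiy} \ne J_{aiy})$$

and

$$\sum_{a=A}^{B} \sum_{j=1}^{\binom{n}{a}} \sum_{x=1}^{N} \mathbb{P}(I_{ajx} = 1) \sum_{b \ne a} \sum_{i=1}^{\binom{n}{b}} \sum_{y=1}^{N} \mathbb{P}(I_{biy} \ne J_{biy})$$

respectively; we need to compute the sum $T_1 + T_2 + T_3$. First, we see that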

T * = EE p2 (^ = 1 ) 

a 3 

v f n \ 1 ( - - 1 
^\aN 2 )

a=A 

^\2N 2 J 



a=2 

= T7^ (1 + o(1)) ^° (18) 

for each A, B. The computation of $T_2$ follows as in the proof of Theorem 2.
The first component, $T_{21}$, is, by (7), given by

$$T_{21} = \sum_{a=A}^{B} \sum_j \sum_x \mathbb{P}(I_{ajx} = 1)\, \frac{2a}{N} \exp\{2a/(N-1)\}\, \lambda_a = \sum_{a} \lambda_a^2\, \frac{2a}{N}\, (1 + o(1)). \qquad (19)$$

Under what circumstances might the bound in (19) tend to zero? Let us
pause to consider this question before continuing. If $n = \sqrt{N \log N}$ and
A = 2, then $\lambda_A \sim (\log N)/2$, the error bound of Theorem 2 is of magnitude
$\sqrt{\log N / N}$, and the bound in (19) does approach zero. However in
this case $\lambda_3 \to 0$ so we are able to derive little useful beyond a Poisson
approximation for $Y_2$, the number of "days" with exactly two "birthdays". If
$n = N^{.9}$, then $\lambda_a \sim N^{1 - .1a}/a! \to \infty$ for all a = 2, 3, ..., 9 but the summands
in (19), asymptotically equal to $N^{1 - 0.2a}$, tend to zero only if a = 6, 7, 8, 9.
We thus have a potential multivariate approximation for $(Y_6, Y_7, Y_8, Y_9)$. Finally,
let $n = N/\log N$. In this case, $\lambda_a \sim (e/(a \log N))^a \cdot N$, and, with
$a = \log N/(2 \log\log N)$, for example, we see that

$$\lambda_a \sim \left(\frac{e}{a \log N}\right)^{a} N = \left(\frac{2e \log\log N}{\log^2 N}\right)^{\frac{\log N}{2 \log\log N}} N = (2e \log\log N)^{\frac{\log N}{2 \log\log N}} \to \infty,$$

while with $a = \log N/[(4 - \varepsilon) \log\log N]$ (we use $a = \log N/(3 \log\log N)$ below)
we have

$$\frac{\lambda_a^2}{N} \sim \frac{1}{N} \left(\frac{e}{a \log N}\right)^{2a} N^2 = \left(\frac{3e \log\log N}{\log^2 N}\right)^{\frac{2 \log N}{3 \log\log N}} N = \left(\frac{3e \log\log N}{\log^{1/2} N}\right)^{\frac{2 \log N}{3 \log\log N}},$$

which leads, with $A = \log N/(3 \log\log N)$ and $B = \log N/(2 \log\log N)$, to

21ogJV 

T 21 < 2(fl - ^ < ^ f^^ N ) W _> 0; 



AT " 3 log log AT V log 1/2 AT 

we thus have a potential Poisson approximation for the vector $(Y_A, \ldots, Y_B)$.
Next note that the term $T_{22}$ that corresponds to (9) is given by

B (™) N 2 2 

a=A j=l a;=l a 

Finally we combine the two remaining terms (10) and (12), as reflected in 
(15), to get 

B (") N // n \ 

T ^3 = EEE p fe = 1 )rr + ^ ex pMiv-i)} 

a=A j=l x=l \ 

a 

Turning to a computation of $T_3$, we first observe that for $a \ne b$ and $|i \cap j| \ge
1$, it is impossible for $I_{biy} = 0$, $J_{biy} = 1$ to occur. Accordingly, as in the
calculation leading up to (6) we see that



EEE p ( J ^ = ' J ^ = 1 ) 

b i y 

= E E E w J ^y = x ) - p ( J ^ = J ^ = *)> 



& /L\/l\9/1\ ^~<? 

n — a \ /o\/l\/l N 



E 7 BE 



y q=Q V*/ \ / \ 
^ / n -a-b\ /J_V /_ 2 \ n - a - b - s { 1 \ b ~ q fN-2Y



s J \N J V N J V^- 1 / \N-1 

N-lY f±\ b J2 ( n - a - h \ flY /_2\ n - a - b ~ s fN-2 



N J \ N ) , \ s J\ N J \ N ' \N-1 

e( b ;>' 

b 

1 \ b ( N 2\ n ~ a ~ b / N l\ a ( 1 \ b ( N 2 x n ~ a ~^ " 



N-1J \N-1J \ N J \NJ \N-1 

^ Afc (o + 6) (i + o(i)) _ (22) 

b 

As in the univariate case, $\mathbb{P}(I_{biy} = 1, J_{biy} = 0) = \mathbb{P}(I_{biy} = 1)$ if $|i \cap j| \ge 1$.
Hence



E E E p ( J ^ = 1 > J ^ = °) < E 

b |«nj|>l 2/ 6 



71—1 

" 1 b - 1 1 ^ 



(23) 



If, however, $|i \cap j| = 0$, then

^ Wux = 1, Jwx = 0) < E 7 
6 |inj|=o 6 ^ ' 

and, being rather crude with the final estimation 

E E E p ( J ^ = 1 ' J ^ = °) 

b \inj\=Oy^x 

E(n — a\ 1 ^-^ ^-^ fn — a — b\ 1 q 



(24) 



b N b q ) Ni N - 1 



^e(;)^e(;:;)^+^> 

<E]^V1 + «(!))■ (25) 
Collecting equations (18) through (25), we see that the following holds: 
Theorem 3 When n balls are randomly assigned to N boxes, where $n \ll N$,
the joint distribution of the multivariate vector $(Y_A, \ldots, Y_B)$ of exact box
counts may be approximated by a Poisson vector with independent components.
More specifically,

$$d_{TV}\Big(\mathcal{L}(Y_A, \ldots, Y_B), \prod_{a=A}^{B} \mathrm{Po}(\lambda_a)\Big) \le \varepsilon_{n,N,A,B},$$

where $\lambda_a = \mathbb{E}(Y_a) = \binom{n}{a} \frac{1}{N^{a-1}} \left(1 - \frac{1}{N}\right)^{n-a}$ and $\varepsilon_{n,N,A,B}$ is of magnitude

E L . 9 /2a a 2 1 n \ v-^ , / , / ( a + °) a & 1 n \\ 

a=A v 7 a Vfe^a v 7 / 

In addition, an application, e.g., of Theorem 10.K in [6] may provide slight
improvements in the above, through a partial reinstatement of the so-called
"magic factor".
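The means $\lambda_a$ in Theorem 3 are easy to compare against simulation. The sketch below is a rough Monte Carlo illustration under our own naming and parameter choices (n = 200, N = 2000, a fixed seed); it estimates only the means of the exact box counts and does not check the total variation bound or the independence of the components.

```python
import random
from collections import Counter
from math import comb


def lambda_a(n, N, a):
    """lambda_a = C(n, a) * N^-(a-1) * (1 - 1/N)^(n-a)."""
    return comb(n, a) / N ** (a - 1) * (1 - 1 / N) ** (n - a)


def simulate_exact_counts(n, N, sizes, trials, seed=0):
    """Monte Carlo means of Y_a, the number of boxes holding exactly a balls."""
    rng = random.Random(seed)
    totals = {a: 0 for a in sizes}
    for _ in range(trials):
        balls_per_box = Counter(rng.randrange(N) for _ in range(n))
        occupancy = Counter(balls_per_box.values())  # occupancy[a] = Y_a
        for a in sizes:
            totals[a] += occupancy[a]
    return {a: totals[a] / trials for a in sizes}


means = simulate_exact_counts(200, 2000, (2, 3), trials=2000, seed=1)
for a in (2, 3):
    print(a, means[a], lambda_a(200, 2000, a))
```

With these parameters $n/N = 0.1$, so the empirical means should sit close to $\lambda_2$ and $\lambda_3$ up to sampling noise.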